Survival regression takes the linear combination and uses it to predict survival. But survival data presents some special challenges:

Censoring: Censoring happens when the event doesn’t occur during the observation time of the study (which, in human studies, means during follow-up). Before considering using survival regression on your data, you need to evaluate the impact censoring may have on the results. You can do this using life tables, the Kaplan-Meier method, and the log-rank test, as described in Chapters 21 and 22.
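To see how censored observations enter a survival estimate, here is a minimal sketch of the Kaplan-Meier calculation in plain Python. The function name kaplan_meier, the 1/0 coding of the event indicator, and the five follow-up times are illustrative assumptions, not taken from this book's examples. The sketch follows the usual convention that a subject censored at the same time as an event is still counted as at risk for that event.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))  # order subjects by follow-up time
    at_risk = len(data)                # everyone is at risk at the start
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Tally every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk subjects who survived time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without changing the estimate
        at_risk -= leaving
    return curve


# Hypothetical data: five subjects, two of them censored (event = 0)
example_times = [2, 3, 3, 5, 8]
example_events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(example_times, example_events):
    print(t, round(s, 3))
```

Censored subjects never lower the curve themselves. They only shrink the number at risk, which makes each later event count for more. That is why heavy censoring deserves a look before you fit a regression model.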

Survival curve shapes: Some business disciplines develop models for estimating time to failure of mechanical or electronic devices. They estimate the times to certain kinds of events, like a computer’s motherboard wearing out or the transmission of a car going kaput, and find that they follow remarkably predictable shapes or distributions (the most common being the Weibull distribution, covered in Chapter 24). Because of this, these disciplines often use a parametric form of survival regression, which assumes that you can represent the survival curves by algebraic formulas. Unfortunately for biostatisticians, biological data tends to produce survival curves whose shapes can’t be represented by these parametric distributions.
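As an illustration of what "represented by an algebraic formula" means, the sketch below evaluates the Weibull survival function, S(t) = exp(-(t/scale)^shape), in plain Python. The function name weibull_survival, the parameter names shape and scale, and the numbers used are assumptions chosen for illustration only.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability of surviving beyond time t under a Weibull model."""
    if t < 0:
        raise ValueError("time must be non-negative")
    return math.exp(-((t / scale) ** shape))


# Hypothetical device lifetimes: a scale of 10 years and three illustrative shapes
for shape in (0.5, 1.0, 2.0):
    curve = [round(weibull_survival(t, shape, 10.0), 3) for t in (0, 5, 10, 20)]
    print(shape, curve)
```

With only two parameters, the whole curve is fixed. A shape below 1 gives a failure rate that falls over time, a shape of 1 gives a constant rate (the exponential distribution), and a shape above 1 gives a rate that rises as parts wear out. Biological survival curves often fit none of these patterns well.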

As described earlier, nonparametric survival analyses using life tables, Kaplan-Meier plots, and log-rank tests are limited. But as biostatisticians, we could not rely on using parametric distributions in our models; we wanted to use a hybrid, semi-parametric kind of survival regression. We wanted one that was partly nonparametric, meaning it didn’t assume any mathematical formula for the shape of the overall survival curve, and partly parametric, meaning we could use some parameter (or predicted survival distribution shape) to guide our formulas the way other industries used the Weibull distribution. In 1972, a statistician named David Cox developed a workable method for doing this. The procedure is now called Cox proportional hazards regression, which we call PH regression for the rest of this chapter for brevity. In the following sections, we outline the steps of performing a PH regression.
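For readers who like to see the hybrid idea written out, the Cox model is usually stated in terms of the hazard (the instantaneous event rate) rather than the survival curve itself. The symbols below, h_0 for the baseline hazard, x_1 through x_k for the predictors, and b_1 through b_k for their coefficients, are notational assumptions and may not match the notation used elsewhere in this book.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp\left(b_1 x_1 + b_2 x_2 + \dots + b_k x_k\right)
```

The baseline hazard h_0(t) is left as an unspecified function of time, which is the nonparametric part. The exponential of the linear combination is the parametric part. Because h_0(t) cancels when you divide the hazards of two subjects, their hazard ratio stays constant over time, which is where the name proportional hazards comes from.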

Since 1972, many issues have been identified with using survival regression for biological data, especially with respect to whether it is appropriate for the type of data. One way to check this is to run a logistic regression model (see Chapter 18) with the same predictors and outcome as your survival regression model, but without the time variable, and see whether the interpretation changes.

The steps to perform a PH regression

You can understand PH regression in terms of several conceptual steps, although when you use statistical software like that described in Chapter 4, it may appear that these steps take place simultaneously. That is because the output is designed for you, the biostatistician, to walk through the following steps in your mind and make decisions. You must use the output to:

1. Determine the shape of the overall survival curve produced from the Kaplan-Meier method.